IPhraxtor - A linguistically informed system for extraction of term candidates
نویسندگان
چکیده
In this paper a method and a flexible tool for performing monolingual term extraction is presented, based on the use of syntactic analysis where information on parts-of-speech, syntactic functions and surface syntax tags can be utilised. The standard approaches to evaluating term extraction, namely by manual evaluation of the top n term candidates or by comparing to a gold standard consisting of a list of terms from a specific domain can have its advantages, but in this paper we try to realise a proposal by Bernier-Colborne (2012) where extracted terms are compared to a gold standard consisting of a test corpus where terms have been annotated in context. Apart from applying this evaluation to different configurations of the tool, practical experiences from using the tool for real world situations are described.
منابع مشابه
Multi-word Term Extraction for Bulgarian
The goal of this paper is to compile a method for multi-word term extraction, taking into account both the linguistic properties of Bulgarian terms and their statistical rates. The method relies on the extraction of term candidates matching given syntactic patterns followed by statistical (by means of Log-likelihood ratio) and linguistically (by means of inflectional clustering) based filtering...
متن کاملYou Can't Beat Frequency (Unless You Use Linguistic Knowledge) - A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of fo...
متن کامل$YWRPDWVNR OXãþHQMH L]UD]MD L] VORYHQVNR-angleških vzporednih besedil
The paper describes the design and structure of a Slovene-English term extraction system. Although the state-of-the-art systems operate on hybrid approaches using various levels of linguistic analysis, sometimes including semantic information, the aim here was to implement both statistical and linguistically motivated methods for both languages and compare the results. It is shown that some met...
متن کاملComputational Linguistics in the Translator’s Workflow—Combining Authoring Tools and Translation Memory Systems
In Technical Documentation, Authoring Tools are used to maintain a consistent text quality—especially with regard to the often followed translation of the original documents into several languages using a Translation Memory System. Hitherto these tools have often been used separately one after the other. Additionally Authoring tools often have no linguistic intelligence and thus the quality lev...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013